136 research outputs found

    Bot recognition in a Web store: An approach based on unsupervised learning

    Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots), which pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, traffic generated by legitimate users must be accurately separated from traffic generated by Web bots. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means) followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that classification based on unsupervised learning is very efficient, achieving a performance level similar to that of fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remain misclassified in both cases, so an in-depth analysis of misclassified samples was also performed. This analysis exposed the superiority of the proposed approach, which correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
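
    The cluster-then-label idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two session features (requests per minute, fraction of image requests), the toy data, the function names, and the majority-vote rule for labelling clusters. Only plain k-means is shown; Graded Possibilistic c-Means is not.

```python
import random
from collections import Counter

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def label_clusters(points, labels, centroids):
    """Assumed labelling rule: majority vote of the known labels in each cluster."""
    votes = {c: Counter() for c in range(len(centroids))}
    for p, y in zip(points, labels):
        if y is not None:  # unlabelled sessions do not vote
            votes[nearest(p, centroids)][y] += 1
    return {c: (v.most_common(1)[0][0] if v else None) for c, v in votes.items()}

def classify(p, centroids, cluster_label):
    """A session inherits the label of its nearest cluster."""
    return cluster_label[nearest(p, centroids)]

# Hypothetical sessions: (requests per minute, fraction of image requests).
sessions = [(2.0, 0.60), (3.0, 0.55), (2.5, 0.70), (4.0, 0.65),
            (40.0, 0.05), (55.0, 0.00), (48.0, 0.10), (60.0, 0.02)]
# Partial and noisy labels: None means unlabelled, and the session
# (48.0, 0.10) is deliberately mislabelled as "human".
labels = ["human", None, "human", None, "bot", "bot", "human", None]

centroids = kmeans(sessions, k=2)
cluster_label = label_clusters(sessions, labels, centroids)
print(classify((50.0, 0.03), centroids, cluster_label))
print(classify((48.0, 0.10), centroids, cluster_label))
```

    Because the clusters are built without looking at the labels, the single mislabelled session is outvoted inside its cluster, which is one way to picture the reduced sensitivity to mislabelled or missing labels that the abstract reports.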

    An automatic method for the lexical disambiguation of names

    This article presents a completely automatic method that solves the lexical disambiguation of names by calculating the conceptual density of each of the senses of the name to be disambiguated. The method was evaluated on the SemCor corpus with a context of only two names, obtaining a precision of 81.5% and a recall of 60.25%. Keywords: lexical disambiguation of names, conceptual density.

    Un método automático para la desambiguación léxica de nombres

    This article presents a completely automatic method that solves the lexical disambiguation of names by calculating the conceptual density of each of the senses of the name to be disambiguated. The method was evaluated on the SemCor corpus with a context of only two names, obtaining a precision of 81.5% and a recall of 60.25%. Keywords: lexical disambiguation of names, conceptual density.

    Soft ranking in clustering

    Due to the diffusion of large-dimensional data sets (e.g., in DNA microarray or document organization and retrieval applications), there is a growing interest in clustering methods based on a proximity matrix. These have the advantage of being based on a data structure whose size only depends on cardinality, not dimensionality. In this paper, we propose a clustering technique based on fuzzy ranks. The use of ranks helps to overcome several issues of large-dimensional data sets, whereas the fuzzy formulation is useful in encoding the information contained in the smallest entries of the proximity matrix. Comparative experiments are presented, using several standard hierarchical clustering techniques as a reference

    Dietary potassium intake and risk of diabetes : a systematic review and meta-analysis of prospective studies

    (1) Background: Dietary potassium intake is positively associated with reduction of cardiovascular risk. Several data are available on the relationship between dietary potassium intake, diabetes risk and glucose metabolism, but with inconsistent results. Therefore, we performed a meta-analysis of the prospective studies that explored the effect of dietary potassium intake on the risk of diabetes to overcome these limitations. (2) Methods: A random-effects dose–response meta-analysis was carried out for prospective studies. A potential non-linear relation was investigated using restricted cubic splines. (3) Results: A total of seven prospective studies met the inclusion criteria. Dose–response analysis detected a non-linear relationship between dietary potassium intake and diabetes risk, with significant inverse association starting from 2900 mg/day by questionnaire and between 2000 and 5000 mg/day by urinary excretion. There was high heterogeneity among studies, but no evidence of publication bias was found. (4) Conclusions: The results of this meta-analysis indicate that habitual dietary potassium consumption is associated with risk of diabetes by a non-linear dose–response relationship. The beneficial threshold found supports the campaigns in favour of an increase in dietary potassium intake to reduce the risk of morbidity and mortality. Further studies should be carried out to explore this topic

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been shown that these two seemingly different approaches share the same mathematical foundation. Besides, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
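
    As a rough illustration of what "kernel version" means for K-means, the sketch below computes the squared distance from a point to a cluster mean in feature space using kernel evaluations only, via the standard expansion K(x,x) - (2/n) Σ K(x,y) + (1/n²) ΣΣ K(y,z). The function names, the toy points, and the gamma value are assumptions made for this example and are not drawn from the survey.

```python
import math

def linear_kernel(x, y):
    """Ordinary dot product; feature space equals input space."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_sq_dist_to_mean(x, cluster, kernel):
    """Squared feature-space distance from x to the mean of `cluster`,
    computed without ever forming the feature map explicitly."""
    n = len(cluster)
    self_term = kernel(x, x)
    cross_term = sum(kernel(x, y) for y in cluster) / n
    mean_term = sum(kernel(y, z) for y in cluster for z in cluster) / (n * n)
    return self_term - 2.0 * cross_term + mean_term

cluster = [(0.0, 0.0), (2.0, 0.0)]
x = (1.0, 1.0)

# With the linear kernel this reduces to the usual squared Euclidean
# distance from x to the centroid (1.0, 0.0).
d2_linear = kernel_sq_dist_to_mean(x, cluster, linear_kernel)
d2_rbf = kernel_sq_dist_to_mean(x, cluster, rbf_kernel)
print(d2_linear, d2_rbf)
```

    Swapping this distance into the assignment step of K-means is what allows nonlinear separating surfaces in the input space, since the cluster means live in the feature space induced by the kernel.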

    Role of continuous glucose monitoring in diabetic patients at high cardiovascular risk: an expert-based multidisciplinary Delphi consensus

    Background: Continuous glucose monitoring (CGM) shows in more detail the glycaemic pattern of diabetic subjects and provides several new parameters (“glucometrics”) to assess patients’ glycaemia and consensually guide treatment. A better control of glucose levels might result in improvement of clinical outcome and reduce disease complications. This study aimed to gather an expert consensus on the clinical and prognostic use of CGM in diabetic patients at high cardiovascular risk or with heart disease. Methods: A list of 22 statements was developed concerning the type of patients who can benefit from CGM, the prognostic impact of CGM in diabetic patients with heart disease, CGM use during acute cardiovascular events, and educational issues of CGM. Using a two-round Delphi methodology, the survey was distributed online to 42 Italian experts (21 diabetologists and 21 cardiologists), who rated their level of agreement with each statement on a 5-point Likert scale. Consensus was predefined as more than 66% of the panel agreeing/disagreeing with any given statement. Results: Forty experts (95%) answered the survey. Every statement achieved a positive consensus. In particular, the panel expressed the feeling that CGM can be prognostically relevant for every diabetic patient (70%) and that it is clinically useful also in the management of those with type 2 diabetes not treated with insulin (87.5%). The assessment of time in range (TIR), glycaemic variability (GV) and hypoglycaemic/hyperglycaemic episodes was considered relevant in the management of diabetic patients with heart disease (92.5% for TIR, 95% for GV, 97.5% for time spent in hypoglycaemia) and able to improve the prognosis of those with ischaemic heart disease (100% for hypoglycaemia, 90% for hyperglycaemia) or with heart failure (87.5% for hypoglycaemia, 85% for TIR, 87.5% for GV). The experts held that CGM can be used during an acute cardiovascular event and can impact the short- and long-term prognosis. Lastly, CGM has a recognized educational role for diabetic subjects. Conclusions: According to this Delphi consensus, the clinical and prognostic use of CGM in diabetic patients at high cardiovascular risk is promising and deserves dedicated studies to confirm the experts’ feeling.